A Deeper Look at Planning as Learning from Replay
Authors
Abstract
In reinforcement learning, the notions of experience replay, and of planning as learning from replayed experience, have long been used to find good policies with minimal training data. Replay can be seen either as model-based reinforcement learning, where the store of past experiences serves as the model, or as a way to avoid a conventional model of the environment altogether. In this paper, we look more deeply at how replay blurs the line between model-based and model-free methods. First, we show for the first time an exact equivalence between the sequence of value functions found by a model-based policy-evaluation method and by a model-free method with replay. Second, we present a general replay method that can mimic a spectrum of methods ranging from the explicitly model-free (TD(0)) to the explicitly model-based (linear Dyna). Finally, we use insights gained from these relationships to design a new model-based reinforcement learning algorithm for linear function approximation. This method, which we call forgetful LSTD(λ), improves upon regular LSTD(λ) because it extends more naturally to online control, and improves upon linear Dyna because it is a multi-step method, enabling it to perform well even in non-Markov problems or, equivalently, in problems with significant function approximation.
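As a rough illustration of how replay blurs the model-based/model-free line, the sketch below evaluates one policy three ways from the same stored transitions: model-free batch TD(0) that replays the whole buffer on every pass, LSTD(0), and a simplified linear-Dyna-style planner that first fits a least-squares linear model (x' ≈ F x, r ≈ bᵀx) and then plans with it. It relies only on the classic fixed-point fact that, when batch TD(0) converges, it converges to the LSTD(0) solution, and that planning with the least-squares model has that same fixed point, since (I − γFᵀ)θ = b reduces to (C − γD)θ = e. It does not reproduce the paper's stronger result about whole sequences of value functions, nor multi-step λ > 0 or forgetful LSTD(λ). The ring environment, reward, buffer size, step size, pass counts and all function names are hypothetical choices made for this example.

```python
import random

N_STATES = 5  # hypothetical 5-state ring, for illustration only
GAMMA = 0.9

def one_hot(s):
    """Tabular (one-hot) feature vector for state s."""
    return [1.0 if i == s else 0.0 for i in range(N_STATES)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def collect_buffer(n_steps, seed=0):
    """Store (s, r, s_next) from one long random walk on the ring.
    Reward is 1 whenever the walk enters state 0, else 0."""
    rng = random.Random(seed)
    s, buffer = 0, []
    for _ in range(n_steps):
        s_next = (s + rng.choice((-1, 1))) % N_STATES
        buffer.append((s, 1.0 if s_next == 0 else 0.0, s_next))
        s = s_next
    return buffer

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def statistics(buffer, phi):
    """C = sum x x^T, D = sum x x'^T, e = sum r x over the buffer."""
    d = len(phi(buffer[0][0]))
    C = [[0.0] * d for _ in range(d)]
    D = [[0.0] * d for _ in range(d)]
    e = [0.0] * d
    for s, r, s2 in buffer:
        x, x2 = phi(s), phi(s2)
        for i in range(d):
            e[i] += r * x[i]
            for j in range(d):
                C[i][j] += x[i] * x[j]
                D[i][j] += x[i] * x2[j]
    return C, D, e

def batch_td0(buffer, phi, gamma, alpha=1.0, passes=1500):
    """Model-free: replay the whole buffer each pass, summing TD(0)
    increments and applying their average once per pass."""
    feats = [(phi(s), r, phi(s2)) for s, r, s2 in buffer]
    d = len(feats[0][0])
    theta = [0.0] * d
    for _ in range(passes):
        inc = [0.0] * d
        for x, r, x2 in feats:
            delta = r + gamma * dot(theta, x2) - dot(theta, x)
            for i in range(d):
                inc[i] += delta * x[i]
        theta = [theta[i] + alpha * inc[i] / len(feats) for i in range(d)]
    return theta

def lstd0(buffer, phi, gamma):
    """LSTD(0): solve (C - gamma D) theta = e."""
    C, D, e = statistics(buffer, phi)
    d = len(e)
    return solve([[C[i][j] - gamma * D[i][j] for j in range(d)] for i in range(d)], e)

def linear_dyna(buffer, phi, gamma, sweeps=500):
    """Model-based: fit x' ~ F x and r ~ b^T x by least squares
    (F^T = C^-1 D, b = C^-1 e), then plan with synchronous backups
    theta <- b + gamma F^T theta over the unit feature vectors."""
    C, D, e = statistics(buffer, phi)
    d = len(e)
    b = solve(C, e)
    ft_cols = [solve(C, [D[i][j] for i in range(d)]) for j in range(d)]  # columns of F^T
    theta = [0.0] * d
    for _ in range(sweeps):
        theta = [b[i] + gamma * sum(ft_cols[j][i] * theta[j] for j in range(d))
                 for i in range(d)]
    return theta

buffer = collect_buffer(500)
theta_td = batch_td0(buffer, one_hot, GAMMA)
theta_lstd = lstd0(buffer, one_hot, GAMMA)
theta_dyna = linear_dyna(buffer, one_hot, GAMMA)
for name, th in [("batch TD(0)", theta_td), ("LSTD(0)", theta_lstd), ("linear Dyna", theta_dyna)]:
    print(f"{name:12s}", [round(v, 4) for v in th])
```

With one-hot features the fitted Fᵀ is simply the empirical transition matrix, so the planning loop is a γ-contraction; with general features neither the planning loop nor batch TD(0) is guaranteed to converge, and the agreement shown here should not be read as holding unconditionally.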
Similar resources
A Deeper Look at Experience Replay
Experience replay plays an important role in the success of deep reinforcement learning (RL) by helping stabilize the neural networks. It has become a new norm in deep RL algorithms. In this paper, however, we showcase that varying the size of the experience replay buffer can hurt the performance even in very simple tasks. The size of the replay buffer is actually a hyper-parameter which needs ...
Offline Replay Supports Planning: fMRI Evidence from Reward Revaluation
Making decisions in sequentially structured tasks requires integrating distally acquired information. The extensive computational cost of such integration challenges planning methods that integrate online, at decision time. Furthermore, it remains unclear whether “offline” integration during replay supports planning, and if so which memories should be replayed. Inspired by machine learning, we ...
Merge Strategies for Multiple Case Plan Replay
Planning by analogical reasoning is a learning method that consists of the storage, retrieval, and replay of planning episodes. Planning performance improves with the accumulation and reuse of a library of planning cases. Retrieval is driven by domain-dependent similarity metrics based on planning goals and scenarios. In complex situations with multiple goals, retrieval may find multiple past pl...
Online Learning with Stochastic Recurrent Neural Networks using Intrinsic Motivation Signals
Continuous online adaptation is an essential ability for the vision of fully autonomous and lifelong-learning robots. Robots need to be able to adapt to changing environments and constraints, and this adaptation should be performed without interrupting the robot's motion. In this paper, we introduce a framework for probabilistic online motion planning and learning based on a bio-inspired stochas...
Intrinsic Motivation and Mental Replay enable Efficient Online Adaptation in Stochastic Recurrent Networks
Autonomous robots need to interact with unknown, unstructured and changing environments, constantly facing novel challenges. Therefore, continuous online adaptation for lifelong learning and the need for sample-efficient mechanisms to adapt to changes in the environment, the constraints, the tasks, or the robot itself are crucial. In this work, we propose a novel framework for probabilistic onli...